Safety

Overview

The safety policy can be used to detect user-provided inputs or model responses that contain unsafe content, such as obscene, toxic, or harmful content.

Safety Policy Definition

Dynamo AI uses the following policy definition for detecting unsafe content.

Name: Prohibit Unsafe Content

Description: Permit prompts that seek responsible, educational information on sensitive topics while prohibiting prompts that promote, facilitate, or normalize harmful content. Harmful content is defined as any material that encourages, instructs, or glorifies actions detrimental to individuals, groups, or society. This includes content involving illegal or criminal activities, violence, harassment, hate speech, discrimination, self-harm, sexual exploitation, child safety violations, privacy breaches, misinformation, dangerous substances, weapons, and malware—as well as any language that is vulgar, abusive, manipulative, or derogatory toward protected characteristics.

Allowed Behaviors:

Inquiry-based learning: Prompts seeking educational information on sensitive topics, such as historical events, social issues, or scientific concepts, for the purpose of learning and understanding.
Responsible discussion: Queries that facilitate respectful and open discussions on sensitive topics like violence, discrimination, and harmful substances 3.General knowledge queries: Prompts seeking information on everyday topics, such as news, entertainment, or general knowledge, that do not promote or facilitate harm.
Personalized advice: Queries seeking advice or guidance on personal issues, such as mental health, relationships, or career development, that do not promote or facilitate harm.
Responsible Sexual Education: Requests for responsible sexual education, health, or consent
Advice on Managing Conflicts: Queries on resources for managing conflicts and reporting harassment
Default allowance: All prompts are allowed unless explicitly prohibited by this policy, ensuring users can engage with the AI system without unnecessary restrictions.
Mental Health Information: Requests for factual resources for mental health

Prohibited Behaviors:

Promoting Harm: Queries that encourage or glorify violent, harassing, or discriminatory behavior towards individuals or groups
Illicit Activities: Requests for information or guidance on illegal or criminal activities, including those related to dangerous substances, creating or obtaining weapons, or malware
Hate Speech and Harassment: Inputs that contain hate speech, derogatory language, or manipulative content targeting protected characteristics. Includes abusive, biased, or unfair remarks
Exploitation and Abuse: Queries that promote or facilitate sexual exploitation or child safety violations
Self-Harm and Violence: Requests or guidance on self-harm, suicidal behavior, or violent acts
Malware and Privacy Violation Topics: Requests, guidance, or attempts at promoting malware or privacy violations
Toxic Language: Vulgar, hateful, or offensive language
Explicit Sexual Content: explicit description of sexual activities, nudity, and pornographic content.
Sexist or Racist Content: sexist or racists commentary including implicit or explicit discrimination or stereotyping.

Safety Policy Actions

You can manage what happens to inputs and outputs when applying the safety policy using the actions below:

Flag: flag content for moderator review
Block: block user inputs or model outputs containing unsafe content

Overview​

Safety Policy Definition​

Safety Policy Actions​

Overview

Safety Policy Definition

Safety Policy Actions